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1.   Introduction 

The  following  report  summarizes  data  analyses  and  statistical  research  conducted  at  Colorado 
State  University  under  cooperative  agreement  with  the  National  Park  Service  (NPS).  The  purpose 
of  this  investigation  was  two-fold:  to  characterize  historic  water  quality  in  the  Middle  Delaware 
Scenic  and  Recreational  River  (MDSRR),  and  to  develop  statistically-based  water  quality  criteria 
for  potentiaJ  use  in  establishing  non-degradation  standards  for  this  segment  of  the  Delaware  River. 

In  Section  2  of  this  report,  we  summarize  the  first  stage  of  the  investigation:  preliminary  data 
analyses,  including  evaluating  data  density,  producing  time  plots,  and  testing  for  trend  and  sea- 
sonality. In  Section  3,  we  discuss  the  second  stage,  conducted  after  presentation  of  the  results  of 
the  first  stage  to  our  co-investigators:  Mark  Flora,  Dave  Ryn,  and  Joel  Wagner  from  the  Water 
Resources  Division  of  the  National  Park  Service  (NPS-WRD),  Beth  Johnson  from  the  Delaware 
Water  Gap  National  Recreation  Area  (DEWA),  and  Dick  Albert  from  the  Delaware  River  Basin 
Commission  (DRBC)^.  Guided  by  the  available  statistical  evidence,  these  investigators  decided 
on  the  water  quality  parameters  for  which  the  Colorado  State  investigators  should  attempt  to  de- 
rive statistical  criteria,  and  decided  on  the  parameters  for  which  special  treatments  were  needed. 
In  addition,  we  briefly  outline  in  Section  3  some  of  our  investigations  into  the  possible  use  of  a 
multivariate  approach  to  these  data. 

Section  4  describes  in  detail  the  methods  used  in  the  derivation  of  statistical  water  quality  criteria, 
and  lists  the  criteria  themselves. 

Throughout  this  investigation,  we  at  Colorado  State  have  utilized  statistical  techniques  which 
we  consider  easy  to  use,  intuitively  appealing,  insensitive  to  outliers,  and  robust  to  violations  of 
basic  assumptions.  Much  attention  is  given  in  this  report  to  the  discussion  of  these  topics. 


^  The  authors  would  like  to  thank  the  co-investigators  mentioned  above,  as  well  as 
Nancy  Driver  (NPS),  for  their  insight  into  this  problem  and  for  their  reviews  of  this 
report.  Special  thanks  are  also  due  to  Todd  Kratzer  (DRBC),  without  whose  help 
we  would  have  no  data  and  no  report. 


2.  Preliminary  data  analyses 
2.1.  Evaluating  data  density 

This  data  set  includes  observations  from  four  sites  on  the  main  stem  Delaware  River — Port  Jcrvis, 
NY;  Montague,  NJ;  Stroudsburg,  PA;  and  Portland,  PA.  Figure  2.1.1  shows  the  Delaware  River 
Basin  with  these  four  sites  labeled  1,  2,  3,  and  4  respectively.  At  Port  Jervis,  there  are  United  States 
Geological  Survey  (USGS),  New  York  Department  of  Environmental  Conservation  (NYDEC),  and 
National  Park  Service  /  Delaware  River  Basin  Commission  (NPS/DRBC)  data;  at  Montague,  there 
are  USGS,  New  Jersey  Department  of  Environmental  Protection  (NJDEP),  and  NPS/DRBC  data; 
at  Stroudsburg,  there  are  Pennsylvania  Department  of  Environmental  Resources  (PADER)  and 
NPS/DRBC  data;  and  at  Portland  there  are  USGS  data. 

For  each  water  quality  parameter  at  each  agency /location  combination,  we  need  to  determine  if 
the  data  are  sufficiently  dense  for  meaningful  statistical  analysis.  Intuitively,  we  would  like  to  see 
a  high  sampling  rate — the  ratio  of  the  number  of  observations  to  the  range  of  time.  But  this  ratio 
contains  no  information  about  the  variability  of  the  parameter:  if  the  parameter  never  changes,  we 
need  not  sample  often;  if  it  varies  wildly,  we  can  hardly  sample  enough. 

The  statistic  we  propose  for  evaluating  data  density  is 

(2.1.1) 


{s/x){ttast  -  t first)' 

where  n  is  the  number  of  observations  in  the  data  set,  s  is  their  sample  standard  deviation,  x  is 
their  sample  mean,  ti^st  is  the  last  month  of  sampling,  and  tjirst  is  the  first  month  of  sampling. 
Note  that  s/x  is  the  sample  coefficient  of  variation,  a  unitless  measure  of  variability.  This  statistic 
is  basically  a  sampling  rate,  but  highly  variable  observations  have  less  weight  than  less  variable 
observations.  Since  the  water  quality  parameters  of  interest  are  all  nonnegative,  the  computed 
statistic  will  be  nonnegative,  with  low  values  indicating  low  data  density  and  high  values  indicating 
high  data  density. 

Determining  the  cutoff  value  for  deciding  when  data  density  is  too  low  for  the  data  analysis  to 
proceed  is  somewhat  arbitrary,  but  an  empirically  derived  value  of  2/3  seems  to  work  well.  This 
cutoff  has  intuitive  appeal  when  interpreted  as  a  minimum  sampling  rate  of  two  observations  per 
quarter  (three  months)  where,  of  course,  the  sampling  rate  is  weighted  by  the  variability  of  the 
parameter.  In  these  data  sets,  this  data  density  statistic  ranges  in  value  from  about  zero  to  about 
ten  and  a  half.  Roughly  forty  percent  of  the  data  sets  have  sufficient  data  density  by  this  criterion. 
No  further  statistical  analysis  was  done  for  data  sets  not  meeting  this  criterion.  •  • 
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2.2.  Time  plots 

For  all  data  sets  with  at  least  ten  observations,  we  make  time  plots  to  look  for  outliers,  regime 
shifts,  gaps,  trends,  and  periodicities.  These  time  plots  are  assembled  in  Appendix  A.  In  these  plots. 
the  value  of  the  parameter  appears  on  the  vertical  axis  and  the  month  of  sampling  appears  on  the 
horizontal  axis.  The  month  of  sampling  is  calculated  as  the  number  of  months  since  December  1959 
(January  1960  is  one).  Appendix  Table  A.l  is  a  "decoder"  for  interpreting  the  time  axes.  This 
time  axis  convention  was  adopted  to  simplify  plotting  and  to  provide  an  ordinal  time  scale  for  the 
calculation  of  the  rank-correlation  coefficient  discussed  below.  Also,  plotting  against  month  rather 
than  against  exact  date  smooths  the  plot  and  may  allow  the  eye  to  see  low  frequency  periodicities 
(for  example,  regularly  varying  seasonal  components  with  periods  of  months,  quarters,  or  years). 
Because  of  the  many  gaps  in  these  data  records,  high  frequency  periodicities  (for  example,  regularly 
varying  components  with  periods  of  weeks,  days,  or  hours)  are  not  possible  to  detect,  visually  or 
statistically. 

The  title  of  the  plot  includes  the  sampling  location — Port  Jervis,  Montague,  Stroudsburg,  or 
Portland — and  the  sampling  agency.  Here,  the  state  abbreviations  NY,  NJ,  and  PA  are  used  to 
represent  the  appropriate  state  sampling  agencies — the  NYDEC,  NJDEP,  and  PADER,  respectively. 
Each  data  point  in  the  body  of  the  plot  is  plotted  with  its  STORET  remark  code  ("B":  results 
based  upon  colony  counts  outside  the  acceptable  range;  "K":  actual  value  is  known  to  be  less  than 
value  given;  "L":  actual  value  is  known  to  be  greater  than  value  given;  "U":  material  was  analyzed 
for  but  not  detected)  if  it  has  one,  and  is  plotted  with  a  "*"  otherwise. 

These  time  plots  reveal  much  that  is  wrong  (from  a  statistical  point  of  view)  with  these  data 
sets.  Much  of  classical  statistics  proceeds  from  the  assumption  that  the  data  we  see  are  a  random 
sample — that  is,  the  observations  are  realizations  of  random  variables  which  are  independent  of 
one  another  and  which  are  identically  distributed.  This  is  clearly  not  the  case  for  many  of  these 
data.  The  assumption  of  identical  distributions  is  not  valid  for  the  data  shown  in  Figure  2.2.1, 
since  it  is  apparent  that  there  is  a  long-term  change  in  the  level  of  the  data,  which  we  call  a  positive 
trend.  It  is  not  valid  for  the  data  in  Figure  2.2.2,  since  there  is  a  sudden  change  in  the  level  and 
the  variability  of  the  data,  which  we  call  a  regime  shift.  Such  regime  shifts  may  be  caused  by 
changes  in  the  watershed  (a  sewage  treatment  plant  comes  on  line,  a  drought  occurs)  or  by  changes 
in  the  sampling  (a  new  lab  technique  is  developed,  reporting  standards  are  changed).  Figure  2.2.3 
appears  to  show  both  types  of  regime  shifts — the  high  values  in  months  60  to  80  are  apparently 
drought  related,  while  the  reporting  of  values  at  a  detection  limit  of  2  after  month  180  seems  to 
be  a  change  in  reporting  standards.  (Incidentally,  the  reporting  of  data  values  at  a  detection  limit 
is  one  example  of  what  statisticians  call  censoring;  there  is  plenty  of  censoring  in  these  data  sets, 
which  raises  more  problems.)  An  outlier  might  be  thought  of  as  a  regime  shift  one  time  unit  in 
length;  it  is  a  value,  usually  noticed  because  of  its  extreme  magnitude,  which  does  not  come  from 


the  same  distribution  as  the  other  values.  To  analyze  properly  all  data  sets  with  regime  shifts  is 
beyond  the  scope  of  this  project. 

The  assumption  of  identical  distributions  fails  again  for  the  data  in  Figure  2.2.4,  since  these  data 
show  a  regular,  oscillating  pattern,  which  we  wiU  in  general  call  a  periodicity,  but  which  we  will 
here  caU  seasonality  since  the  period  of  oscillation  corresponds  to  regular  divisions  of  the  time  a.\is, 
namely  months. 

Figure  2.2.4  also  shows  another  deviation  from  a  random  sample — lack  of  independence.  It  is 
clear  that  in  this  data  set,  low  values  tend  to  be  followed  by  low  values  and  high  values  tend  to 
be  followed  by  high  values;  since  successive  observations  carry  information  about  each  other,  the 
observations  are  dependent.  This  type  of  dependence  is  called  positive  serial  correlation  and  can  be 
described  as  a  "stickiness"  of  the  data.  The  type  of  dependence  indicated  by  low  values  tending  to 
be  followed  by  high  values  and  vice- versa  is  called  negative  serial  correlation  and  can  be  described 
as  a  "slipperiness"  of  the  data. 

When  data  are  collected  sequentially  in  time  and  the  random  sampling  assumptions  are  not 
met,  statisticians  often  rely  on  time  series  analysis  in  which,  classically,  trend  and  seasonalities 
are  estimated  and  removed  and  the  correlation  structure  of  the  leftovers  (or  residuals)  is  estimated 
and  used  in  a  predictive  model.  Classical  time  series  analysis,  however,  is  based  on  observations 
taken  at  regular  time  intervals,  since  this  allows  efficient  estimation  of  the  correlation  structure. 
The  data  sets  we  examine  are  very  irregularly  observed,  and  this  together  with  the  moderate  sizes 
of  the  samples  makes  estimation  of  the  correlation  structure  impractical. 

Our  analysis  must  therefore  take  a  middle  road  between  classsical  statistics  and  time  series 
analysis.  We  first  attempt  to  weed  out  data  sets  for  which  the  assumption  of  identical  distributions 
seems  unreasonable:  those  showing  trend,  seasonality,  and  regime  shifts  or  outliers.  We  attempt 
to  resolve  the  problems  of  these  data  sets,  if  possible.  Ultimately,  we  derive  statistical  criteria 
from  data  sets  showing  no  obvious  trend,  periodicities,  regime  shifts  or  outliers,  and  we  emphasize 
methods  that  are  insensitive  to  outliers  and  are  robust  to  violations  of  assumptions. 
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2.3.  Testing  for  trend 

In  addition  to  checking  visually  for  trend  in  the  time  plots,  we  compute  a  statistic,  Spearman's 
vdnk-corrcldlion  coefficient.,  to  quantify  trend.  In  simple  terms,  an  upward  trend  over  time  is  a 
tendency  for  small  values  of  the  parameter  to  occur  early  in  the  record  and  for  large  values  of 
the  parameter  to  occur  late  in  the  record.  By  ranking  the  parameter  values  and  looking  at  the 
correlation  between  these  ranks  and  their  positions  in  the  time  sequence,  we  get  the  rank- correlation 
coefRcient,  a  measure  of  trend.  This  coefficient  is  relatively  insensitive  to  outliers  (since  it  uses  ranks 
and  not  raw  data)  and  the  procedure  ignores  gaps  in  the  time  record. 

For  example,  if  we  saw  the  data  set  15,  13,  10,  7,  5;  replaced  these  raw  data  with  their  ranks  5, 
4,  3,  2,  1;  and  computed  the  correlation  between  the  ranks  and  their  positions  in  the  time  sequence 
(1,5),  (2,4),  (3,3),  (4,3),  (5,1);  we  would  get  negative  one — the  ranks  are  perfectly  negatively 
correlated  with  the  time  sequence.  We  might  conclude  that  there  is  evidence  of  a  downward  trend. 

If  we  assume  that  the  observations  in  a  data  set  are  independent,  we  can  use  the  rank-correlation 
coefficient  for  a  distribution-free  test  of  the  null  hypothesis  of  no  trend  vs.  the  two-tailed  alternative 
hypothesis  of  upward  or  downward  trend.  For  rather  small  numbers  of  observations  (as  low  as  ten), 
the  distribution  of  the  rank-correlation  coefficient  is  well- approximated  by  a  normal  distribution 
(Gibbons  1971),  so  the  test  mentioned  above  is  easy  to  perform.  It  may  be,  however,  that  the 
observations  within  a  data  set  are  not  independent.  Though  we  make  no  attempt  to  assess  the  effect 
of  dependence  on  the  distribution  of  the  statistic,  we  believe  that  the  rank-correlation  coefficient 
remains  a  reasonable  measure  of  trend,  and,  under  the  types  of  weak  dependence  that  we  might 
expect  to  see  in  water  quality  data,  the  approximate  distribution  of  the  statistic  should  not  be 
greatly  affected. 


2.4.   Testing  for  seasonality 

Data  sets  which  pass  the  visual  and  statistical  tests  for  trend  and  which  contain  at  lezist  40  obser- 
vations are  subjected  to  a  test  for  seasonality.  In  this  test,  we  are  looking  specifically  for  a  seasonal 
component  which  repeats  itself  quarterly  (we  require  a  minimum  of  40  observations  so  that  we  have 
about  10  observations  per  quarter,  enough  for  the  test).  The  possibility  of  quarterly  seasonal  compo- 
nents in  these  data  is  suggested  by  plots  of  average  monthly  stream  flow,  which  can  be  fairly  neatly 
divided  into  four  regimes  corresponding  to  winter  (December-January-February),  spring  (March- 
April-May),  summer  (June-July-August),  and  autumn  (September-October-November).  We  might 
expect  this  flow  pattern  to  be  somehow  reflected  in  the  values  of  the  water  quality  parameters. 

The  test  we  use  for  determining  the  presence  of  a  significant  quarterly  seasonal  component  is  the 
Kruskal-Wallis  test  for  the  equality  of  four  populations.  Again,  we  rank  the  set  of  observations, 
then  label  them  according  to  their  season.  If  in  fact  the  samples  from  the  four  seasons  are  from 
the  same  population,  the  labels  should  be  well  mixed.  If,  on  the  other  hand,  the  labels  are  not 
well  mixed — perhaps  the  winter  observations  are  all  clustered  at  the  high  end — the  hypothesis  of  a 
common  population  is  discredited. 

The  properties  of  the  Kruskal-Wallis  test  and  equivalent  tests — the  Wilcoxon  k-sample  test  and 
the  Mann- Whitney  U-test,  for  example — have  been  studied  extensively;  for  a  discussion  of  the 
details  and  a  guide  to  some  of  the  statistical  literature,  we  refer  the  interested  reader  to  Lehmann 
1971. 

Technically,  the  Kruskal-Wallis  test  assumes  that  the  four  samples  are  independent  and  that 
each  is  a  random  sample;  that  is,  the  observations  are  independent  and  identically  distributed. 
Independence  may  not  be  a  reasonable  assumption  to  make — -for  example,  observations  within  the 
same  season  may  be  correlated.  Assuming  that  the  correlation  between  observations  close  in  time 
is  positive,  we  can  see  the  effect  of  the  violation  of  the  assumption  of  independence  on  the  test 
statistic. 

Positive  correlation  will  cause  the  ranks  to  be  "sticky"  in  the  time  sequence;  high  observations 
will  tend  to  be  followed  by  high  observations,  low  observations  will  tend  to  be  followed  by  low 
observations.  This  stickiness  is  what  Kruskal-Wallis  interprets  as  evidence  against  a  common 
population.  Thus,  the  test  will  tend  to  reject  the  null  hypothesis  of  a  common  population  more 
often  than  it  should.  This  means  that  we  will  sometimes  take  the  conservative  approach  of  splitting 
the  data  into  seasonal  sets  when  it  is  in  fact  unnecessary. 

The  SAS  output  for  these  seasonality  tests  is  compiled  in  Appendix  B,  and  the  results  are 
summarized  in  Section  2.5. 


2.5.   Summary  of  preliminary  analyses 

The  chart  on  the  following  page  summarizes  the  results  of  preliminary  data  analyses  as  discussed 
in  Sections  2.1,  2.3,  and  2.4.  In  addition,  the  chart  reflects  any  changes  made  in  the  data  sets  (foi 
example,  removal  of  outliers)  as  discussed  in  Section  3.2.  The  reference  for  the  USGS  parameter 
codes  used  in  this  chart  is  USGS  1979. 

We  illustrate  the  use  of  this  chart  by  discussing  the  total  hardness  (00900)  data  sets.  Of  the  six 
agency/location  data  sets,  five  had  sufficient  data  density  for  further  analysis.  No  further  anal\'sis 
was  attempted  on  the  data  set  from  the  NJDEP  at  Montague,  NJ,  since  it  had  insufficient  data 
density  (•).  The  five  data  sets  with  sufficient  data  density  were  subjected  to  a  test  for  trend:  both 
the  NYDEC  data  from  Port  Jervis,  NY  (x  .37  <)  and  the  USGS  data  from  Montague  {<  .18  <) 
showed  some  evidence  of  (weak)  negative  trend.  The  three  remaining  data  sets,  which  showed  no 
evidence  of  trend,  were  subjected  to  a  test  for  seasonality:  the  data  set  consisting  of  99  observations 
taken  by  the  USGS  at  Port  Jervis  {•99)  and  the  data  set  consisting  of  44  observations  taken  by 
the  USGS  at  Portland,  PA  ("44)  both  showed  evidence  of  seasonality.  The  remaining  data  set, 
consisting  of  103  observations  taken  by  the  PADER  at  Stroudsburg,  PA,  had  sufficient  data  density 
for  analysis  and  showed  no  evidence  of  trend  or  seasonality. 


Symbol  Meaning 

•  Insufficient  data  density. 

-<  .50  <C  Evidence  of  negative  trend  of  magnitude  -.50. 

>  .50  >-  Evidence  of  positive  trend  of  magnitude  .50. 

•  75  Size  of  data  set  with  evidence  of  seasonality. 

75  Size  of  data  set  with  no  evidence  of  trend  or  seasonality. 
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NJDEP 

PADER 
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turbidity 

JTU 
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19 
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-<  .22  < 
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>  . 
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73 
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tot.  alkaJin. 

mg/^CaC02 

00410 
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10 

tot.  susp.  sol. 

mg/^ 
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■<  .49  < 

19 
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mg/^N 

00605 

49 

83 

• 

• 

tot.  amm. 

mg/^N 

00610 

22 

• 
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tot.  >f02  +  N03 

mg/^N 

00630 

22 

-<  .33  < 

• 

•  ^ 

>1 

•  60 

tot.  P 

rng/^P 

00665 

54 

• 

103 

diss,  ortho  P 

mg/^P 

00671 

23 

11 

TOC 

mgll 

006S0 

29 

12 

-;  .30  < 

tot.  hardness 

mg/^CaCOs 

00900 

.99 

•<  .37  < 

■<  .18  < 

103 

.44 

diss.  Ca 

mg/£-Ca 

00915 

.101 

• 

.139 

• 

»  .31  y 

diss.  Mg 

mg/£-Mg 

00925 

.104 

• 

•  139 

• 

>.29x 

diss.  Na 

mg/^Na 

00930 

»  .26  X 

• 

>  .56  >- 

. 

>  .56:^ 

diss.  K 

mg/M< 

00935 

»  .27  V 

• 

>  .42  V 

• 

75 

tot.  chloride 

mg/£-Cl- 

00940 

>  .27  y 

>  .44  y 

>  .38  y 

■91 

>  .64  y 

tot.  sulfate 

mg/^-SOf 

00945 

<  .31  « 

■<  .38  < 

.147 

2 

7 

>.27X 

»  .35  y 

tot.  arsenic 

/ig/^As 

01002 

-;.32< 

• 

■ 

tot.  cadmium 

^g/^Cd 

01027 

• 

• 

tot.  chromium 

/ig/^Cr 

01034 

23 

• 

• 

tot.  copper 

/xg/^Cu 

01042 

• 

• 

• 

tot.  iron 

/ig/^Fe 

01045 

• 

. 

. 

tot.  lead 

/ig/^Pb 

01051 

• 

. 

tot.  Mn 

^g/^Mn 

01055 

. 

. 

. 

tot.  zinc 

/xg/^Zn 

01092 

• 

• 

. 

fee.  col. 

31616 

• 

. 

. 

fee.  strep. 

31673 

. 

tot.  diss,  solids 

mgjl 

70301 

.82 

>  A2y 

• 

>  .61  V 

tot.  mere. 

^g/^Hg 
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• 

■ 

-<.47< 

Symbol     Meaning 


Insufficient  data  density. 
■<  .50  <     Evidence  of  negative  trend  of  magnitude  -.50. 
>  .50  y     Evidence  of  positive  trend  of  magnitude  .50. 
.75     Size  of  data  set  with  evidence  of  seasonality. 
75     Size  of  data  set  with  no  evidence  of  trend  or  seasonality. 


3.   Secondary  data  analyses 

On  July  G,  1989,  we  at  Colorado  State  met  with  co-investigators  Mark  Flora,  Dave  Ryn,  and  Joel 
Wagner  (NPS-WRD),  Beth  Johnson  (DEWA),  Dick  Albert  (DRBC),  and  with  Nancy  Driver  (NPS), 
to  present  the  results  of  our  preliminary  analyses  and  to  decide,  based  on  the  co-investigators" 
expert  judgement  and  on  the  available  statistical  evidence,  which  parameters  should  be  subjected 
to  further  analysis.  Sixteen  parameters  were  selected. 

3.1.   Data  combinations 

To  reduce  the  number  of  data  sets  we  would  need  to  analyze  and  to  increase  the  sample  sizes 
we  would  have  for  deriving  statistical  criteria,  we  attempted  whenever  possible  to  pool  the  data 
sets  corresponding  to  the  sixteen  surviving  parameters.  In  all  of  the  following  tests,  the  statistical 
criterion  used  was  the  Kruskal-Wallis  test  for  the  equality  of  k  populations. 

For  nonseasonal  data  sets,  we  first  tested  to  see  if  different  agencies'  data  at  the  same  site  were 
comparable.  Applying  the  Kruskal-Wallis  test,  we  pooled  the  observations  from  the  two  data  sets, 
then  ranked  them,  keeping  track  of  the  agency.  Again,  if  the  two  populations  generating  the  two 
data  sets  are  the  same,  then  the  agency  labels  in  the  pooled,  ranked  data  set  should  be  well-mixed. 
A  tendency  for  the  agency  labels  to  be  sticky — perhaps  the  USGS  data  are  consistently  Jower  than 
the  NJDEP  data — discredits  the  hypothesis  that  the  two  populations  are  identical.  The  discussion 
in  section  2.4  of  robustness  to  violation  of  the  independence  assumption  is  again  relevant  here. 

Next,  if  agency  data  could  be  combined,  we  tested  to  see  if  data  sets  from  different  locations 
along  the  river  could  be  combined.  We  cross-checked  the  results  of  all  these  tests  by  computing 
summary  statistics  and  plots  for  the  resulting  pooled  data  sets. 

For  seasonal  data  sets,  we  first  attempted  to  combine  the  data  from  different  seasons  at  a  partic- 
ular agency /location.  Often  spring  observations  were  quite  different  from  those  taken  during  the 
rest  of  the  year,  and  summer,  fall,  and  winter  data  could  reasonably  be  combined. 

Next,  we  attempted  to  combine,  on  the  basis  of  seasons  or  pooled  seasons,  different  agencies' 
data  at  the  same  site.  We  did  these  combinations  on  the  basis  of  the  least-pooled  data  set;  that  is, 
if  one  agency's  data  set  is  seasonal  and  the  other's  is  not,  then  we  try  to  combine  the  two  agencies' 
data  seasonally. 

We  then  attempted  to  combine,  on  the  basis  of  seasons  or  pooled  seasons,  data  from  different 
locations  along  the  river.  Again,  any  resulting  pooled  data  sets  were  summarized  (on  a  seasonal 
basis,  if  necessary)  with  statistics  and  plots. 
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3.2.   Miscellany 

In  tliis  section  we  document  outlier  treatment,  data  "scrubbing,"  and  any  other  special  procedures 
;i|)plied  to  these  data.  We  emphasize  the  treatments  of  the  sixteen  remaining  parameters. 

An  outlier,  as  discussed  earlier,  is  a  data  point  which,  because  of  its  extreme  magnitude — either 
too  small  or  too  large — is  recognized  as  not  being  from  the  same  population  (not  having  the  same 
distribution)  as  the  other  data  values.  This  may  be  caused  by  a  real  change  in  the  watershed  (a 
pollution  event,  for  example),  by  an  error  in  collecting  the  data  (a  faulty  probe,  for  example),  or 
by  an  error  in  recording  the  data.  Outliers  are  to  be  removed  from  data  sets  because  they  contain 
no  information  about  the  population  of  interest  and  because  they  can  adversely  affect  inferences 
about  that  population.  Our  rule  for  removing  outliers  is  a  very  conservative  one:  we  remove  only 
those  extreme  data  points  which  are  clearly  erroneous. 

Other  data  removed  from  these  data  sets  are  all  data  flagged  "J"  (estimated,  not  accurate)  in 
the  STORET  record.  Also,  a  few  of  these  data  sets  contained  single  observations  isolated  by  lags 
of  ten  years  or  more  from  the  rest  of  the  observations.  Typically,  this  was  a  single  observation  in 
1950  followed  by  no  other  observations  until  1960  or  later.  These  observations  were  removed,  again 
because  they  are  thought  to  contain  little  information  about  the  population  of  interest. 

Other  data  which  were  removed  were  found  in  NO2+NO3  data  from  both  the  NJDEP  at  Mon- 
tague, NJ  and  the  USGS  at  Portland,  PA.  In  each  set,  there  is  a  block  of  observations  censored  at 
a  detection  limit  which  is  high  relative  to  the  rest  of  the  data.  Such  a  censored  observation  tells  us 
only  that  the  observation  did  not  exceed  a  value  which  it  was  unlikely  to  exceed,  and  so  it  contains 
little  information.  These  censored  observations  do,  however,  bias  all  computed  statistics,  and  so 
they  are  removed.  It  is  important  to  note  that  no  other  censored  observations  are  removed. 

Data  were  also  removed  in  the  case  of  overlapping  data  between  agencies.  A  diagnostic  plot  made 
in  testing  whether  USGS  and  NYDEC  organic  nitrogen  data  could  be  combined  at  Port  Jervis 
revealed  a  suspicious  overlap;  apparently,  the  data  were  collected  under  a  cooperative  agreement, 
both  agencies  recorded  the  results  (sometimes  one  agency  recorded  the  date  the  sample  was  taken 
while  the  other  recorded  the  date  the  lab  results  returned),  and  both  data  sets  entered  the  STORET 
record.  In  this  case  we  removed  all  redundant  observations;  we  kept  only  one  observation  out  of  any 
pair  of  observations  which  were  identical  in  the  month,  the  year,  and  the  value  of  the  parameter, 
and  which  were  within  three  days  of  each  other  (to  account  for  the  discrepancy  between  sampling 
date  and  lab  date).  This  occurred  once  again  when  we  combined  USGS  and  NYDEC  specific 
conductance  data  at  Port  Jervis;  the  redundant  observations  were  removed. 

One  special  technique  apphed  in  analyzing  some  of  these  data  sets  is  splitting  the  time  series  at 
an  apparent  regime  shift  and  analyzing  the  before  and  after  data  sets  separately.  This  is  certainly 
justifiable,  but  determining  that  a  regime  shift  has  occurred  and  choosing  the  point  at  which  the 
regime  shifted  is  complex  from  a  statistical  point  of  view  and  is  necessarily  subjective  in  this 
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analysis.  VVe  relied  on  the  water  quality  expertise  of  the  co-investigators  from  the  NPS  and  DRBC. 

In  some  sets,  we  chose  to  ignore  indications  of  trend.  For  example,  at  Port  Jervis,  NY,  there  is  an 
indication  of  a  very  weak  positive  trend  (.16)  in  the  NYDEC  specific  conductance  data;  ignoring  this 
indication  is  not  particularly  worrisome,  since  it  should  have  little  impact  on  any  criteria  derived 
statistically  from  these  data. 

Two  parameters  receiving  special  treatment  were  dissolved  oxygen,  for  which  a  seasonal  analysis 
was  done  (summarized  in  Appendix  C),  and  pH,  for  which  summary  statistics  and  plots  were 
produced,  ignoring  trend  (summarized  in  Appendix  D). 
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3.3.   Multivariate  analyses 

VVc  made  some  attempt  to  explore  the  multivariate  relationships  among  these  data.  Of  course, 
we  looked  closely  at  the  relationship  between  each  water  quality  parameter  and  time.  We  also 
focnsscd  on  finding  a  viable  covariate — a  variate  which  is  closely  related  to  the  variate  of  interest, 
and  which  ideally  is  cheap  and  easy  to  obtain.  Knowledge  of  the  values  of  the  covariate  could  then 
be  used  in  drawing  inferences  about  the  variate. 

Stream  flow  would,  it  seems  to  us,  be  an  excellent  covariate.  It  is  sampled  continuously  and 
should,  for  obvious  physical  reasons,  be  correlated  with  water  quality  parameters.  Flow  values  are 
sometimes  recorded  in  the  STORET  record,  but  often  are  not;  we  were  forced  to  get  them  from 
another  source.  This  problem  made  flow  unworkable  as  a  covariate  for  the  purpose  of  this  project. 
We  did,  however,  make  use  of  flow  information  by  selecting  seasonal  periods  based  on  plots  of 
average  monthly  flow,  as  discussed  in  Section  2.4. 

\'Ve  also  examined  the  correlation  structure  of  several  parameters:  flow,  conductance,  calcium, 
sodium,  potassium,  magnesium,  chloride,  sulfate,  and  dissolved  solids.  These  parameters  are  phys- 
ically related — dissolved  sodium  enhances  conductance,  increased  stream  flow  dilutes  dissolved 
solids,  etc. — and  all  of  the  observations  of  the  chemical  parameters  came  from  the  same  water 
sample,  while  the  conductance  and  flow  readings  came  on  the  same  day.  These  relations  made 
a  multivariate  approach  natural.  Scatter  plots  and  correlation  matrices  from  this  analysis  are 
compiled  in  Appendix  E. 

The  correlation  structures  from  such  an  analysis  might  be  used  in  developing  standards;  the 
relationship  among  the  values  of  the  water  quality  parameters  could  be  helpful  in  determining 
when  a  violation  has  occurred,  since  the  value  of  one  parameter  could  be  cross-checked  against  the 
values  of  the  other  parameters.  Such  a  multivariate  analysis  for  irregularly  observed  time  series  with 
small  to  moderate  sample  sizes  would  be  an  interesting  topic  for  further  research,  but  is  beyond 
the  scope  of  this  project. 
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4.   Statistically-derived  water  quality  criteria 
4.1.   Distribution-free  quantile  estimation 

.'Vfter  characterizing  the  avaJlable  data  sets  and  determining  which  data  sets  can  be  combi nod. 
we  find  some  data  sets  from  which  statistical  criteria  can  be  derived.  These  data  sets  have  sufliciciit 
data  density  for  analysis  and  can  reasonably  be  modeled  as  identically  distributed  (or  identically 
distributed  within  seasons  for  data  showing  evidence  of  seasonality).  In  particular,  these  data  sets 
(or  the  seasonal  data  sets)  show  no  evidence  of  trend  or  other  dependence  on  time,  such  as  shifts 
in  regime.  In  this  section  we  will  assume  that  the  observations  within  a  given  data  set  (or  within 
the  corresponding  seasonal  data  sets)  are  identically  distributed. 

For  a  nonseasonal  data  set,  we  model  the  n  observations  Xi,X2,-  •  •  ,x„  as  realizations  of  random 
variables  Xi,X2,---  ,Xn,  where  X,  is  distributed  F,  i  =  1,2,...  ,n.  Here  F  is  the  cumulative 
distribution  function  (cdf)  defined  by 

(4.1.1)  F{x)  =  Prob[Xi<x], 

and  we  will  assume  that  F  is  continuous. 

Under  this  model,  a  parameter  of  interest  is  the  q^^  quantile,  an  unknown  constant  ^q  with  the 
property  that  if  Xj  is  distributed  F,  then,  by  (4.1.1), 

(4.1.2)  F{(q)  =  Prob[Xi<^q]  =  q,      for  0  <  5  <  1. 

This  parameter  is  of  interest  in  water  quality  criteria  because  for  a  sufficiently  high  value  of  ry,  say 
q  —  .90,  an  exceedance  of  ^^  by  a  water  sample  data  value  is  a  rare  event,  happening  only  ten 
percent  of  the  time  under  normal,  natural  conditions,  and  so  may  be  an  indication  of  a  nonnatnrai 
event. 

Since  ^q  is  unknown,  it  is  necessary  to  estimate  it  from  the  data.  Rather  than  giving  an  estimate 
of  ^q  itself  (a  point  estimate),  we  give  a  confidence  interval  estimate  of  ^q]  that  is,  we  choose  a  pair 
of  values  which  we  are  highly  confident  surrounds  <f^.  A  point  estimate  by  itself  is  practically  mean- 
ingless from  a  statistical  point  of  view,  since  there  is  no  way  of  knowing  its  precision.  Confidence 
interval  estimates  have  a  measure  of  precision  built  into  them. 

The  main  confidence  interval  estimation  procedure  we  use  makes  no  additional  assumptions  on 
F — it  may  be  normal,  lognormal,  gamma,  etc. — and  so  is  a  distribution-free  procedure  (often  called 
a  nonparametric  procedure,  though  this  is  misleading  since  ^q  is  a  parameter).  The  procedure  does 
assume  that  A'ljXo,...  ,Xn  are  independent  and  are  all  distributed  F. 

To  obtain  the  confidence  interval  estimator,  we  sort  the  n  data  values  in  ascending  order  to 
obtain  the  order  statistics  Xi:n,  X2:ni  ■  ■  ■  i^n-.ni  and  we  write 

(-1.1.3)  l<Prob[X,..n<^q<X,..n], 
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where  7  is  the  desired  level  of  confidence,  typically  .95.  Now  the  inequality  in  (4.1.3)  is  simply 

at  least  i  of  the  n  X,  's  are  <  ^g  and 
at  most  J  —  1  of  the  n  yY,'s  are  <  (q 

=  y^  Prob  [exactly  k  of  the  n  Xi's  are  <  if,] 


(4.1.4)  7  <  P^'ob 


k-i 


The  probability  to  be  computed  inside  the  sum  in  (4.1.4)  is  the  probability  of  exactly  k  "successes" 
(a  success  occurring  when  X,  <  (g)  in  n  independent  trials.  By  (4.1.2),  each  trial  has  probability 
of  success 

Prob  [X,  <  <ej  =  F{^g)  =  9, 

so  the  probability  to  be  computed  is  from  the  familiar  binomial  distribution  with  parameters  71  and 
7,  and  (4.1.4)  becomes 

i-i 

^. 

For  a  given  quantile  level  g,  sample  size  n,  and  confidence  level  7  <  I  —  q^ ^  we  can  choose  i  and 
j  to  make  the  inequality  in  (4.1.5)  true  (7  is  as  large  as  possible  when  i  =  1  and  j  =  n),  and  this 
choice  of  i  and  j  determines  our  confidence  interval  estimator  {Xi:n,Xj.n)- 

Now  there  are  different  choices  for  i  and  j  which  will  yield  an  interval  of  at  least  the  desired 
confidence.  To  see  this,  consider  the  following  example:  we  have  n  =  75  observations  and  we  would 
like  to  get  a  95%  confidence  interval  estimate  (7  =  .95)  for  the  unknown  constant  ^.90.  We  sort 
the  75  observations  and  choose  as  our  confidence  interval  estimator  the  smallest  observation  and 
the  largest  observation,  (Xi:75, X75;75).  We  can  be  highly  confident  that  this  estimator  will  cover 
the  constant  .f.go;  in  fact,  our  confidence  in  this  interval  estimator  would  be  99.96%.  But  obviously 
this  confidence  interval  will  be  too  wide  to  be  of  much  use.  A  more  reasonable  confidence  interval 
estimator  is  (X62:75, -^73:75),  in  which  we  can  place  confidence  of  96.80%.  Yet  another  confidence 
intervEd  estimator  of  at  least  the  desired  95%  confidence  is  {Xq^-jsjXj^-js),  in  which  we  can  place 
confidence  of  96.22%. 

Our  method  for  choosing  an  interval  among  alternatives  of  comparable  confidence  is  to  choose  j 
as  small  as  possible  for  the  given  confidence  level  and  then,  with  j  fixed,  choose  the  largest  value  of 
/'  which  gives  at  least  the  desired  confidence.  That  is,  we  choose  the  smallest  value  of  j  for  which 
(4.1.5)  can  be  true,  then  we  choose  the  largest  value  of  i  for  which  (4.1.5)  is  true. 

The  reason  for  choosing  the  confidence  interval  estimator  (X,:„,Xj:„)  in  this  way  is  that  for  all 
of  the  data  sets  in  this  study  from  which  statistical  criteria  can  be  derived,  high  values  of  the  water 
quality  parameter  are  of  most  concern.  Now  Xj:n,  the  upper  end  of  the  confidence  interval,  is  an 
upper  threshold  for  the  quantile:  we  can  state  with  high  confidence  that  the  quantile  of  interest  is 


smaller  than  Xj;n\  in  fact 


7   <  Prob  [Xi:n  <^g<  X,:n]  <  Prob  [^g  <  Xy.n] 
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for  i  >  1.  Thus  {Xi:n,Xj:n)  IS  a  IOO7  percent  confidence  interval  for  the  q^'^  quantile,  but  in 
addition,  Xj._n  is,  with  confidence  at  least  7,  the  smaJlest  possible  upper  threshold  for  the  q'^ 
(|uniililc. 

Returning  to  our  numerical  example  above,  we  illustrate  this  procedure.  First,  we  find  ihai  if 
we  choose  j  to  be  any  smaller  than  j  =  73,  then  it  is  impossible  to  get  an  interval  estimator  in 
which  we  can  place  at  least  95%  confidence;  thus  the  smallest  value  of  j  for  which  (4.1.5)  can 
be  true  is  j  —  73.  Now,  with  j  fixed  at  73,  we  find  the  largest  vaJue  of  i  which  makes  ('1.1.5) 
true.  (We  could  choose  i  =  1  and  the  inequality  in  (4.1.5)  would  be  true,  but  we  want  a  narrower 
confidence  interval.)  It  turns  out  that  the  largest  value  of  i  for  which  (4.1.5)  is  true  is  i  =  62.  Thus 
our  interval  estimate  is  (X62:75,-Y73:75),  in  which  we  have  96.80%  confidence.  Furthermore,  we  are 
highly  confident  that  if.go  is  smaller  than  Xr^-js. 

For  each  data  set,  we  report  a  set  of  confidence-interval  estimates:  estimates  of  the  .85,  .90,  and 
.95  level  quantiles.  These  estimates  are  at  a  level  of  at  least  .95,  if  possible;  estimates  are  reported 
only  if  at  a  confidence  level  of  at  least  .90.  For  data  sets  with  evidence  of  seasonality,  we  apply 
this  same  procedure  to  each  season's  data.  These  confidence  interval  estimates  are  summarized  in 
Section  4.1.1. 

This  procedure  is  not  affected  by  small  outlying  observations  or  by  censoring  on  the  lower  end 
{e.g.,  by  data  recorded  at  a  lower  limit  of  detection),  provided  the  number  of  these  "bad"  observa- 
tions is  less  than  i.  It  can  be  affected  by  large  outlying  observations.  Typically,  for  data  sets  of  the 
sizes  we  are  examining,  the  confidence  interval  for  if. 95  will  have  as  its  upper  limit  Xn-.n,  the  largest 
order  statistic,  which  will  be  an  outlier  if  any  large  outlying  observations  are  present  in  the  data. 
This  would  not  take  away  our  confidence  in  the  interval,  but  it  might  make  the  confidence  interval 
and  the  associated  upper  threshold  too  large  to  be  useful. 

This  procedure  is  inherently  robust  to  distributional  assumptions  since  it  does  not  need  any.  Its 
robustness  to  the  independence  assumption  is  a  more  difficult  question,  since  water  quality  data 
collected  over  time  will  often  not  be  independent,  but  will  have  positive  serial  correlation.  This 
question  can  be  rephrased  as  follows:  how  would  knowledge  that  these  data  are  not  independent 
but  are  in  fact  serially  correlated  affect  our  confidence  in  this  estimation  procedure? 

We  cannot  answer  this  question  definitively,  since  to  do  so  we  would  need  to  know  the  distribution 
of  the  data  and  the  structure  of  the  correlation,  which  we  do  not  know  and  cannot  estimate  (with 
much  confidence)  for  these  moderately-sized  samples,  which  have  many  missing  values.  We  can. 
however,  argue  that  the  procedure  remains  reasonable  and  our  confidence  should  not  be  greatly 
affected. 

First,  we  should  note  that  though  these  observations  may  have  some  serial  correlation,  it  is 
not  strong.  The  standard  measure  of  serial  correlation  is  the  sample  autocorrelation  at  lag  h 
(Brockwell  and  Davis  1987),  which  is  basically  the  sample  correlation  between  observations  h 
time  units  apart;  for  example,  the  autocorrelation  at  lag  one  is  the  correlation  between  successive 
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observalions.  Like  ordinary  correlations,  autocorrelations  range  in  vjilue  from  —1  to  1,  with  negative 
values  indicating  negative  serial  correlation  ("slipperiness")  and  positive  values  indicating  positive 
serial  correlation  ("stickiness").  To  test  statistically  whether  an  autocorrelation  at  some  lag  h  is 
significantly  different  from  zero,  we  use  Bartleit's  test,  which  compares  the  sample  autocorrelations 
to  the  bounds  ±1.96/ y/n  and  finds  them  significantly  different  from  zero  if  they  are  outside  these 
bounds. 

In  examining  the  time  plots,  the  most  obviously  "sticky"  time  series  is  dissolved  oxygen  at 
Stroudsburg,  and  its  sample  autocorrelation  at  lag  one  is  .59  (n  =  82  with  10  missing  values).  Note 
that  we  did  not  estimate  quantiles  for  dissolved  oxygen  but  instead  conducted  a  seasonal  analysis. 
A  few  other  data  sets  had  sufficiently  regular  sampling  to  allow  estimates  of  the  autocorrelations: 
chemical  oxygen  demand  data  from  the  USGS  at  Port  Jervis  had  no  statistically  significant  autocor- 
relations (n  =  68  with  13  missing  values);  total  phosphorus  data  from  the  PADER  at  Stroudsburg 
had  no  statistically  significant  autocorrelations  (n  =  116  with  13  missing  values);  total  hardness 
data  from  the  PADER  at  Stroudsburg  had  statistically  significant  autocorrelations  at  lags  one 
(.22)  and  two  (.30)  {n  =  116  with  13  missing  values);  and  total  chloride  data  from  the  PADER  at 
Stroudsburg  had  statistically  significant  autocorrelations  at  lags  one  (.26)  and  six  (.29)  (n  =  102 
with  11  missing  values).  Weak  autocorrelation  should  not  grossly  affect  our  estimation  procedure, 
since  it  is  inherently  very  cautious. 

Second,  asymptotic  theory  indicates  that  for  large  samples  and  m- dependent  processes  (processes 
in  which  observations  within  m  time  units  of  each  other  are  dependent  but  in  which  observations 
more  than  m  time  units  from  each  other  are  independent — a  plausible  model  for  water  cjuality 
data)  ,  the  q^^  sample  quantile  Xk-.n-,  where  ^  approximately  equals  5,  is  asymptotically  normal 
with  mean  ^^  (David  1970).  Since  our  confidence  intervals  always  cover  the  q^'^  sample  quantile 
and  are  quite  wide,  the  procedure  should  work  well  for  large  samples  and  m-dependent  processes, 
and  should  work  reasonably  well  for  these  moderately-sized  samples. 

Third,  we  conducted  a  brief  simulation  study  which  offers  some  encouragement.  Basically,  we 
simulated  1000  different  time  series,  each  of  length  75,  and  made  confidence  interval  estimates  and 
upper  threshold  estimates,  as  discussed  above,  of  if. 35,  ^.90,  and  ^.95.  The  simulated  time  series 
were  fairly  typical  of  those  we  see  in  these  data  sets:  they  are  bounded  below  by  zero,  skewed  to 
the  right,  and  have  weak  positive  serial  correlation.  None  of  the  empirical  confidence  estimates  was 
less  than  .90.  The  technical  details  and  results  of  the  simulation  are  summarized  in  Section  4.1.2. 


18 


4.1.1.  Distribution-free  quantile  estimates 

00095  Specific  conductance  (/z-mhos/cm  at  25°  C) 

Tho  .specific  conductance  data  from  the  USGS  and  the  NYDEC  at  Port  Jervis,  N^'  can  bo 
c.<>\\\\n\\('(\  on  a  seasonal  l)a.sis.  No  seasonal  pooling  h;i.s  boon  atlcmi)tod  since  llu^se  sea.sona.l  ilata 
sets  are  quite  large. 

To  get  an  idea  of  the  use  of  these  tables,  consider  the  first  line  of  the  table  below:  it  states  that 
there  were  68  readings  of  specific  conductance  by  the  USGS  and  the  NYDEC  at  Port  Jervis  during 
the  spring  months  March-April-May;  based  on  these  data,  we  can  be  95.16%  confident  that  the 
.85'-^  quantile,  <f.85,  lies  between  0:49:68  =  66.0  and  a;63:68  =  73.0. 

Season       n       q        i      Xi^n      j       Xj:n      Confidence 

Spring      68     .85     49     66.0     63      73.0  .9516 

.90     56     69.0     66      77.0  .9553 

.95     60     71.0    68     112.0         .9628 

Summer     78     .85     58     72.0     72      79.0  .9539 

.90     63     73.0     75      87.0  .9551 

.95     70     77.0     78      97.0  .9659 

Fall         73     .85     56     78.0     68      86.0  .9507 

.90     61     79.0     71      94.0  .9529 

.95     65     83.0     73     100.0  .9659 

Winter      45     .85     33     74.0     43      92.0  .9608 

.90     37     75.0     45     100.0  .9593 

.95     32     73.0     45     100.0  .9006 

The  next  two  sets  of  confidence  interval  estimates  are  based  on  data  sets  from  Montague,  NJ. 

The  data  from  the  two  agencies  sampling  there  could  not  be  combined.  USGS  conductance  data  at 

Montague,  NJ  show  evidence  of  seasonality,  but  the  summer,  fall,  and  winter  data  can  be  combined. 

Season(s)  n        q         i       Xi^n       j        Xj^n      Confidence 

Spring  36      .85      26      66.0      35       83.0  .9651 

.90      29      70.0      36       85.0  .9540 

Summer- Fall- Winter     118     .85      88      76.0     107      82.0  .9502 

.90      98      79.0     112      85.0  .9504 

.95     107     82.0     117     125.0  .9677 

The  conductance  data  from  the  NJDEP  at  Montague,  NJ  show  no  evidence  of  seasonality. 

Season      n       q        i      x,:n      _;'       Xj:n      Confidence 

All        53     .85     39     84.0     50     105.0  .9575 

.90     43     86.0     52     158.0  .9597 

.95     40     84.0     53     418.0  .9340 
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00335  Chemical  oxygen  demand  (xng/i) 

The  chemical  oxygen  demand  data  are  from  the  USGS  at  Port  Jervis,  NY.  There  are  many  Lied 
observations  in  this  data  set. 

Season      n       q       i      x,:„      j      Xj-.n     Confidence 

AU        66     .85     50      8.0      62     16.0  .9615 

.90     54     10.0     64     16.0  .9539 

.95     58     10.0     66     29.0  .9607 

00605  Organic  nitrogen  (mg/^-N) 

The  organic  nitrogen  data  are  from  the  combined  USGS  and  NYDEC  data  at  Port  Jervis,  NY. 
Note  that  overlapping  observations  have  been  deleted  from  the  USGS  set. 

Season       n        q        i       x,:^       j       Xj-n     Confidence 

AU        132     .85     103     .36      120      .56  .9564 

.90     110     .44      125      .59  .9533 

.95     119     .56     130      .77  .9569 

00610  Total  ammonia  (mg/^N) 

Because  of  an  apparent  regime  shift,  the  total  ammonia  data  from  the  NYDEC  at  Port  Jervis, 

NY  are  here  considered  for  months  t  >  305.  There  is  no  evidence  of  sea.sonality  in  this  truncated 

data  set,  but  there  is  evidence  of  a  negative  trend.  We  will  ignore  this  indication,  a  conservative 

procedure. 

Season      n       q       i      x,:„      j      Xj-n     Confidence 

All        66     .85    50     .031     62     .065         .9615 
.90     54     .040     64     .080  .9539 

.95    58     .048     66     .200         .9607 

We  also  split  the  total  ammonia  data  from  the  PADER  at  Stroudsburg,  PA,  at  month  i  -  305. 
The  data  for  i  <  305  show  no  evidence  of  seasonality,  but  show  some  evidence  of  positive  trend. 

Season      n       q        i      x,:„      j      Xj:n     Confidence 

All        59     .85     42     .070     55     .110  .9512 

.90     49     .080     58     .140  .9543 

.95     50     .090     59     .150  .9309 

The  data  for  at  Stroudsburg  for  t  >  305  show  no  evidence  of  seasonality  or  trend. 

Season      n       q       i      Xi-n      j      Xj-n     Confidence 

All        44     .85     32     .030     42     .050  .9596 

.90     36     .040    44     .080  .9623 
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00630  Total  NO2  +  NO3  (mg/i-N) 

Ignoring  the  negative  trend  in  the  NYDEC  data  at  Port  Jervis,  NY,  we  find  borderline  evidence 
of  seasonality.  The  following  data  are  from  the  seasonal  combination  of  USGS  and  NYDli^C-  data 
at  Port  Jervis,  NJDEP  data  at  Montague,  and  USGS  data  at  Portland.  Note  that  some  censored 
observations  have  been  deleted  from  both  the  Montague  and  Portland  data. 

Season       n       q        i      Xi^n      J      ^j-.n     Confidence 

Spring      61     .85     45      .41      57      .54  .9547 

.90     47      .43      59      .76  .9500 

.95     53      .50     61     1.80  .9530 

Summer     71     .85     53      .34     66      .47  .9579 

.90     59      .36     69      .58  .9549 

.95     63      .40     71     1.14  .9650 

Fall        60     .85     44     .36     56      .53  .9517 

.90     50      .45     59      .65  .9520 

.95     52      .48     60     1.03  .9511 

Winter      27     .85     19      .50     27      .74  .9738 

.90     14      .43     27      .74  .9419 

00665  Total  phosphorus  (mg/£-P) 

The  following  estimates  are  from  the  USGS  total  phosphorus  data  at  Port  Jervis. 

Season      n       q        i      x,:n      j      Xj._n     Confidence 

AU        54     .85     40     .04     51      .11  .9591 

.90     44     .05     53      .12  .9598 

.95     41      .04     54      .14  .9373 

These  estimates  are  from  the  PADER  total  phosphorus  data  at  Stroudsburg,  PA. 

Season      n        q       i      x,:n       j      Xj._n     Confidence 

All        103     .85     78     .06      94       .10  .9520 

.90     83     .07      98       .14  .9508 

.95     93     .10     102      .17  .9533 

00900  Total  hardness  (mg/^CaCOa) 

The  following  confidence  interval  estimates  are  based  on  the  PADER  data  from  Stroudsburg, 
P.A.  These  data  show  no  seasonality,  but  we  should  note  that  the  USGS  hardness  data  at  Portland 
do  show  seasonality;  this  data  set  is,  however,  too  small  to  analyze  on  a  seasonal  basis. 
Season       n        q        i      x,:n       j       Xj-n     Confidence 

All        103     .85     78     28.0      94      34.0  .9520 

.90     83     30.0      98      40.0  .9508 

.95     93     32.0     102     52.0  .9533 
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00915  Dissolved  calcium  (mg/£-Ca) 

Summer,  fall,  and  winter  calcium  data  can  be  pooled  in  both  the  USGS  at  Port  Jervis  set  and 
the  USGS  at  Montague  set,  and  the  two  resulting  data  sets  can  be  combined.  Furthermoic,  the 
spring  data  sets  at  the  two  sites  can  be  combined. 

Season(s) 
Spring 


n 

9 

i 

•^i-.n 

J 

•^  j:n 

Confidence 

58 

.85 

44 

7.0 

55 

8.4 

.9595 

.90 

48 

7.3 

57 

17.0 

.9562 

.95 

44 

7.0 

58 

18.0 

.9490 

182 

.85 

140 

8.0 

163 

8.6 

.9507 

.90 

153 

8.1 

171 

9.0 

.9535 

.95 

164 

8.6 

178 

9.5 

.9503 

00925  Dissolved  magnesium  (mg/^-Mg) 

There  is  evidence  of  seasonality  in  the  USGS  magnesium  data  at  both  Port  Jervis  and  Montague. 

No  pooling  of  seasons  is  possible  and  the  two  data  sets  cannot  be  combined  seasonally.  The  following 
estimates  are  based  on  the  USGS  data  at  Port  Jervis. 

Season       n       q       i      Xi:^      j      Xj.^  Confidence 

Spring      25     .85     18      1.3     25      1.4  .9573 

.90     13      1.2     25      1.4  .9282 


Summer     26 


.85     19 
.90     13 


1.5 
1.3 


26 
26 


2.1 
2.1 


.9533 
.9354 


Fall         29     .85     21      1.7     29      1.9  .9687 

.90     21      1.7     29      1.9  .9513 

Winter      24     .85     17      1.3     24      1.6  .9600 

.90     13      1.3     24      1.6  .9202 

These  next  estimates  are  based  on  the  USGS  magnesium  data  at  Montague 

Season       n       q        i      Xi^n      j 


Xj:n     Confidence 


Spring      33     .85  23  1.4  32  2.1  .9612 

.90  26  1.6  33  2.2  .9550 

Summer     39     .85  29  1.7  38  2.1  .9611 

.90  31  1.8  39  2.1  .9704 

Fall        38     .85  28  1.8  37  2.4  .9633 

.90  30  1.9  38  3.9  .9707 

Winter      29     .85  21  1.5  29  2.1  .9687 

.90  21  1.5  29  2.1  .9513 
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00940  Total  chloride  (mg/£-Cr) 

The  following  estimates  are  from  the  FADER  total  chloride  data  at  Stroudsburg,  PA. 


Season 

n 

9 

i 

•'^i-.n 

J 

•^  j:n 

Confidence 

All 

91 

.85 

70 

9.0 

84 

12.0 

.9575 

.90 

74 

9.0 

87 

13.0 

.9528 

.95 

83 

11.0 

91 

18.0 

.9522 

00945  Total  sulfate  (mg/^-SOf ) 

The  sulfate  data  set  from  the  USGS  at  Montague  shows  evidence  of  seasonality;  the  sulfate  clatn 

set  from  the  NJDEP  at  the  same  site  is  too  small  {n  =  27)  to  check  for  seasonality  and  is  too 

small  to  determine  if  it  can  be  combined  with  the  USGS  data.  We  give  separate  confidence  interval 

estimates  for  the  two  data  sets,  seasonally  for  the  USGS  data  and  nonseasonally  for  the  NJDEP 

data.   Note  that  there  are  problems  with  ties  in  these  data.  The  first  set  of  estimates  is  based  on 

the  USGS  data. 

Season       n       q        i      x,:„      _;      Xj.^n     Confidence 

Spring      35     .85     25     12.0     34     16.0  .9647 

.90     28     13.0     35     17.0  .9550 

Summer    40     .85     26     11.0     38     13.0  .9510 

.90     32     12.0     40     16.0  .9697 

Fall        40     .85    26     12.0    38     14.0         .9510 
.90     32     12.0     40     16.0  .9697 

Winter      32     .85     22     14.0     31  16.0  .9581 

.90     31     15.0     32  16.0  .9540 

These  next  estimates  are  based  on  the  NJDEP  data. 

Season      n       q        i      x,:^      _;'  Xj:n  Confidence 

All        27     .85     19     10.0     27     21.0  .9738 

.90     14     10.0     27     21.0  .9419 

70301  Total  dissolved  solids  (mg/£) 

The  dissolved  solids  data  from  the  USGS  at  Port  Jervis,  NY  show  evidence  of  seasonality.  Sum- 
mer, fall,  and  winter  data  can  be  combined,  forming  a  data  set  with  58  observations.  The  spring 
data  set  contains  24  observations. 

Season(s)  n       q        i      Xi:n      j      Xj-.n     Confidence 

Spring  24     .85     17     36.0     24     45.0  .9600 

.90     14     35.0     24     45.0  .9202 

Summer-Fall-Winter     58     .85     44     41.0     55     43.0  .9595 

.90     48     42.0     57     45.0  .9562 

.95     45     41.0     58     47.0  .9490 
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4.1.2.  Simulation  methods  and  results 

A  common  distributional  model  for  water  quality  data  is  the  log  normal,  a  distribution  which 
is,  like  much  water  cjuality  data,  bounded  below  by  zero  and  skewed  to  the  right.  In  the  following; 
study,  we  simulated  values  xi,X2, . . .  ,X75  of  a  time  series  {XJ,  t  =  1,2, .. .  ,75,  where  the  A'j's  are 
identically  distributed  log  normaJ  and  where  the  succesive  observations  are  not  independent  but 
are  instead  positively  correlated.  The  number  of  observations,  n  =  75,  was  chosen  as  fairly  typical 
of  the  size  of  data  sets  we  were  examining. 

To  obtain  log  normal  variates  with  the  desired  correlation  structure,  we  first  simulated  a  series 
Zo,  Zi.Zo,. . .  , 2^75  distributed  independently  and  identically  normal  with  mean  zero  and  variance 
one.  We  then  formed  a  moving  average  of  order  one  defined  by 

Yt  =  Zt  +  (l>Zt-u     fort  =  1,2,...  ,75, 

where  o  =  .466974.  This  is  one  simple  way  of  introducing  dependence  in  a  model.  Now  }'(  is 
distributed  normal  with  mean  zero  and  variance  1  +  4'^,  and  defining 

Xt  =  e^',     for  f  =  1,2,...  ,75, 

we  have  that  the  X^'s  are  distributed  log  normal  with  parameters  zero  and  1  +  0^.  Straightforward 
calculations  show  that  the  correlation  between  successive  values,  Covr{Xt,Xt-i),  or  autocorrelation 
at  lag  one,  is  .25. 

It  is  also  straightforward  to  compute  the  true  values  of  the  .85,  .90,  and  .95  quantiles: 

^85  =  3.13878 
^90  =  4.11422 
Gs  =  6.14427 

We  chose  confidence  intervals  (and  the  corresponding  upper  thresholds)  as  detailed  in  Section  <l.l 
and  compared  these  to  the  true  values.  (In  choosing  these  confidence  intervals  we  wanted  confi- 
dence levels  of  at  least  .95,  and  we  obtained  confidence  levels,  in  a  random  sampling  setting,  of 
.9632  for  ,^.85,  of  .9680  for  ^.90,  and  of  .9662  for  (f.95.)  By  repeating  this  simulation  1000  times — that 
is,  simulating  1000  different  time  series  of  length  75,  finding  the  confidence  intervals  for  eacli,  and 
comparing  the  confidence  intervals  to  the  true  values — we  obtained  empirical  estimates  of  the  confi- 
dence we  could  place  in  this  procedure  when  applied  to  log  normal  data  with  weak  autocorrelation. 
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The  following  results  were  obtained: 


Empirical  confidence  in  confidence  interval  for  if.gs 

Empirical  coiifidcncc  in  confidence  interval  for  (^.90 

Empirical  confidence  in  confidence  interval  for  ^.95 

Empirical  confidence  in  upper  threshold  for  ^.35 

Empirical  confidence  in  upper  threshold  for  ^.90 

Empirical  confidence  in  upper  threshold  for  (f.95 


.921 
.952 
.944 
.957 
.978 
.963 


Estimating  the  standard  error  of  7,,  the  empirical  confidence  in  our  confidence  interval  for  ^g,  with 


7c?(l  -  iq) 
1000      ' 

we  see  statistical  evidence  that  the  confidence  actually  attained  when  this  procedure  is  applied 
to  log  normal  data  with  weak  autocorrelation  is  lower  than  the  confidence  expected  in  a  random 
sampling  setting.  The  statistical  evidence  also  suggests  that  the  confidence  actually  attained  is 
higher  than  .90. 
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4.2.   Paranietric  quantile  estimation 

A  few  of  these  data  sets  lend  themselves  to  parametric  modeling:  by  looking  at  frequency  dia- 
grams, (ixianlile-quanlile  plots,  and  summary  statistics  for  the  raw  data  or  for  suitably  transformed 
data,  we  may  be  able  to  guess  a  distribution  function  Fg  for  the  data;  here  the  distribution  function 
Fe  depends  on  a  vector  of  unknown  constants,  the  parameter  vector  0.  We  then  get  an  estimate  0 
of  the  parameter  vector  tf,  express  the  quantile  of  interest  as  a  function,  say  /i,  of  the  parameters: 

and  estimate  the  quantile  by 

Combined  summer,  fall,  and  winter  dissolved  solids  data  from  the  USGS  at  Port  Jervis,  NY 
might  reasonably  be  modeled  as  realizations  from  a  normal  distribution.  Furthermore,  three  of 
these  data  sets — USGS  total  phosphorus  data  at  Port  Jervis,  PADER  total  phosphorus  data  at 
Stroudsburg,  and  USGS  chemical  oxygen  demand  data  at  Port  Jervis — might  reasonably  be  mod- 
eled as  realizations  from  a  log  normal  distribution,  which  is  equivalent  to  saying  that  the  natural 
logarithms  of  these  data  might  reasonably  be  modeled  as  realizations  from  a  normal  distribution  (X 
is  distributed  log  normal  (ij.,ct')  if  and  only  if  InX  is  distributed  normal  (/i,(7^)).  Here  0  =  (i.t,a-). 

Criteria  for  determining  if  the  normal  is  a  reasonable  distributional  model  include  inspection 
of  the  frequency  diagram  (this  gives  an  idea  of  the  shape  of  the  distribution  and  should  resem- 
ble the  familiar  bell-shaped  curve),  calculation  of  the  Shapiro-Wilk  statistic,  and  inspection  of  the 
quantile- quantile  plot  (more  commonly  called  the  normal  probability  plot  when  testing  the  hypoth- 
esis of  normality).  The  Shapiro-Wilk  statistic  tests  the  hypothesis  that  the  data  are  a  random 
sample  from  a  normal  distribution  (SAS  Procedures  1985).  The  normal  probability  plot  is  a  plot  of 
order  statistics  against  theoretical  quantiles  from  a  standard  normal  distribution  (/x  =  0,  a"  =  1). 
Specifically,  it  is  a  plot  of  the  ordered  pairs  (x,:n,  $~^(^)),  where  $~^  is  the  inverse  of  the  standard 
normal  cdf  $.  If  the  data  are  in  fact  well-modeled  by  a  normal  distribution,  then  we  would  expect 
the  order  statistic  a;,;„,  which  is  the  -  sample  quantile,  to  be  a  fairly  good  estimate  of  $~^(-^), 
which  is  the  ^      theoretical  quantile,  and  so  we  would  expect  the  plot  to  look  like  a  straight  line. 

Our  method  of  estimation  is  maximum  likelihood,  which  is  a  method  of  choosing  0  =  {fi,a-)  so 
that,  loosely  speaking,  the  probability  of  seeing  the  data  we  in  fact  saw  is  as  high  as  possible.  The 
maximum  likelihood  estimate  (MLE)  of  /x  is  the  sample  mean,  x,  in  the  normal  case,  and  is  the 
sample  mean  of  the  logged  data, 


-Vlnxi, 


n 


in  the  log  normal  case.  In  the  normal  case,  the  MLE  of  a-  is  ^^— ^5^,  where  s^  is  the  sample  variance, 
and  in  the  log  normal  case,  the  MLE  is  ^^^— ^  times  the  sample  variance  of  the  logged  data. 
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Now,  in  the  normaJ  case, 

(,  =  /i  +  z,a, 

where  Zq  is  a  "look-up"  value  from  the  standard  normal  table;  it  is  in  fact  the  q^'^  quantilc  for  a 
standard  normal  distribution.  We  estimate  ^g  by 

and  this  estimate  is  again  an  MLE  by  a  well-known  result  called  the  invariance  property  of  MLE's 
(MGB  1974).  In  the  log  normal  case, 

and  so  we -estimate  <fq  by 

and  this  estimate  is  also  an  MLE  by  the  invariance  property.  We  now  have  estimates  of  the  quantiles. 
but  we  would  like  some  idea  of  the  precision  of  these  estimates.  We  will  discuss  a  method  by  which 
we  can  construct  confidence  intervals  for  the  quantiles  of  interest  based  on  the  MLE's. 

A  fundamental  result  in  the  theory  of  maximum  likelihood  estimation  tells  us  that  the  estimator 
(f,  is  asymptotically  normal  (that  is,  approximately  normal  for  a  large  sample  size  n)  with  mean  ^q 
(which  is  nice)  and  with  variance  given  by  a  rather  complicated  function  involving  a  matrix  known 
as  the  Fisher  information  matrix.  We  refer  the  interested  reader  to  MGB  1974.  For  the  normal 
case,  the  asymptotic  standard  error  (ASE)  reduces  to 


and  we  estimate  the  ASE  by 


For  the  lognormal  case,  the  ASE  is 


and  we  estimate  the  ASE  by 
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An  approximate  95  percent  confidence  interval  for  ^q  in  either  the  normal  or  the  lognormal  case  is 
then 

^q  ±  1.96  (Est.  ASE). 

Parametric  confidence  intervals  are  summarized  in  Section  4.2.1. 
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The  advantages  of  parametric  modeling  are  compelling.  For  small  and  moderate  sample  sizes,  wc 
can  obtain  tight  confidence  intervals  (relative  to  the  distribution-free  procedure)  at  any  confidence 
level  for  any  quantile.  In  particular,  we  can  extrapolate  to  obtain  estimates  of  quantiles  which 
are  outside  the  range  of  the  data,  something  we  cannot  do  with  the  distribution-free  approach;  for 
example,  in  these  data  sets  we  cannot  get  a  distribution-free  confidence  interval  estimate  of  <f  99, 
but  we  can  get  a  parametric  confidence  interval  estimate  of  it. 

Clearly  this  procedure  is  not  as  easy  to  use  as  the  distribution-free  approach,  since  it  requires 
many  steps  in  model  selection  before  quantiles  may  be  estimated.  The  strong  results  attainable  with 
this  parametric  approach  come  at  the  price  of  strong  assumptions:  here  the  data  are  assumed  to  be  a 
random  sample  and  their  distribution  is  assumed  to  be  known  to  within  the  value  oiO.  These  strong 
assumptions  make  this  approach  inherently  less  robust  than  the  distribution-free  approach.  Unlike 
the  distribution-free  approach,  the  parametric  approach  is  complicated  by  censored  observations 
(though  this  was  not  a  factor  in  the  data  sets  we  examined).  In  addition,  the  parametric  approach 
can  be  sensitive  to  outliers;  in  particular,  estimates  based  on  the  sample  mean  and  the  sample 
variance  can  be  distorted  by  outliers.  For  these  reasons,  the  parametric  approach  should  be  used 
with  caution  in  data  sets  such  as  these,  and  the  parametric  approach  was  not  used  extensively  in 
this  project. 
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4.2.1.   Parametric  quantile  estimates 

00335  Chemical  oxygen  demand  (mg/£) 

The  chemical  oxygen  demand  data  from  the  USGS  at  Port  Jervis,  NY  show  no  evidence  of 
seasonality.  There  are  65  observations  in  this  data  set,  none  of  which  is  censored.  These  data  can 
be  modeled  with  a  log  normal  distribution.  The  p- value  for  the  Shapiro- Wilk  statistic  on  the  logged 
data  is  p  =  .1151,  meaning  we  do  not  reject  the  hypothesis  of  log-normality  of  the  raw  data.  The 
frequency  diagram,  boxplot,  and  normal  probability  plot  for  the  logged  data  in  the  following  SAS 
output  indicate  that  a  log  normal  model  may  be  appropriate  for  the  raw  data. 
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Maximum  likelihood  estimates  of  the  parameters  are  fi  =  1.9005  and  a^  =  .2472.  Thus  we  have 
the  following  quantile  estimates  and  confidence  intervals: 

q  ^q         Estimated  ASE    Approx.  95%  Confidence  Interval 
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00665  Total  phosphorus  (mg/£-P) 

The  Lolal  phospliorus  data  from  tlie  USGS  at  Port  Jervis,  NY  show  no  ovidcnco  of  seasonality. 
There  are  64  observations  in  this  data  set,  one  of  which  is  censored.  These  data  can  be  niotleU>d 
with  a  log  normal  distribution.  The  p-value  for  the  Shapiro- Wilk  statistic  on  the  logged  data  is 
p  -  .2380,  meaning  we  do  not  reject  the  hypothesis  of  log-normality  of  the  raw  data.  The  freciuency 
diagram,  boxplot,  and  normal  probability  plot  for  the  logged  data  in  the  following  SAS  output  are 
also  in  agreement  with  a  log  normal  model  for  the  raw  data. 


VlrliDll'Lhf 


UNI««RIATE   PROCEDURE 

St»«   L»af 

Boaolot 

-19    7 

0 

-:o  : 

0 

-22    11 

-24   50 

-24  i 

-23   93 

-30    7000 

_2"?    ^fy 

-34   911111144 

-34   771 

«— .— « 

-33   441111111*'''*^ 

15 

-40   44 

-42    40 

-44 

,. 

-44    11111 

-48    3 

-50 

-52   ' 

-54    2 

1 
1 

0 

nuitipir  sua 

Luf    by 

10««- 

Mar»al 

Probibility  Plot 

♦•♦«« 

»»  ♦♦ 

«♦♦» 


«♦«♦» 


Maximum  likelihood  estimates  of  the  parameters  are  (i  =  -3.6148  and  a-  =  .5298.    Thus  we 
have  the  following  quantile  estimates  and  confidence  intervals: 

q         (f,        Estimated  ASE    Approx.  95%  Confidence  Interval 
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The  total  phosphorus  data  from  the  PADER  at  Stroudsburg,  PA  show  no  evidence  of  seasonality. 
There  are  103  observations  in  this  data  set,  one  of  which  is  censored.  These  data  can  be  modeled 
with  a  log  normal  distribution.  The  p-value  for  the  Shapiro- Wilk  statistic  on  the  logged  data  is 
p  —  .0484,  a  borderhne  value;  we  will  not  reject  the  hypothesis  of  log-normality  of  the  raw  data. 
The  frequency  diagram,  boxplot,  and  normal  probability  plot  for  the  logged  data  in  the  following 
SAS  output  indicate  that  a  log  normal  model  may  be  appropriate  for  the  raw  data. 


V«rwDU«LNf> 


UNIVARIATE 

PROCEDURE 

Sti«  ttlf 

, 

-14   71 

2 

-19    7703 

4 

-20 

-:;  00000011 

8 

-21  jo;iii 

4 

-24    44444 

5 

-29    5111111111 

10 

-30  0000000000000000 

14 

13 

-34  4iumiiiniiii 

14 

-34 

-38  nuiuniiii 

13 

-40  7 

1 

-4: 

-44 

-44  im 

4 

.1-1 »-- .—  ^.. — « ^ 

Hultiolr    St».L>lf    tiy    10>I-1 

NocmI    ProHibiUty  Plot 


xxxx%t% 


«♦♦♦ 


-4.7»«»    t    S    S 

-Z 


Maximum  likelihood  estimates  of  the  parameters  are  /i  =  —3.1155  and  b^  —  .3969.    Thus  wc 
have  the  following  quantile  estimates  and  confidence  intervals: 

q         ^q        Estimated  ASE     Approx.  95%  Confidence  Interval 
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70301  Total  dissolved  solids  (mg/^) 

The  dissolved  solids  data  from  the  USGS  at  Port  Jervis,  NY  show  evidence  of  seasonality.  Sum- 
mer, fall,  and  winter  data  can  be  combined,  forming  a  data  set  with  58  observations.  Tlioro  aro 
no  censored  observations  in  this  data  set.  These  data  can  be  modeled  with  a  normal  distribuiioii. 
The  7> value  for  the  Shapiro- Wilk  statistic  is  p  =  .2815,  meaning  we  do  not  reject  the  hypothesis 
of  normality.  The  frequency  diagram,  boxplot,  and  normal  probability  plot  in  the  following  S.'\S 
output  are  also  in  agreement  with  a  normal  model. 
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Maximum  likelihood  estimates  of  the  parameters  are  fi  =  38.9828  and  a"  =  10.5342.  Thus  we 
have  the  following  quantile  estimates  and  confidence  intervals: 

q  ^q  Estimated  ASE     Approx.  95%  Confidence  Interval 
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5.    Postscript 

VVc  first  reempliasize  that  these  are  messy  data  sets — they  have  missing  values  and  censored  val- 
ues, regiiiie  shifts  and  outliers,  trends  and  periodicities,  and  they  are  generally  sparse  and  sporadic. 
Evaluating  the  amount  of  information  available  in  such  data  sets  is  not  easy,  but  tools  like  the 
data  density  statistic  in  Section  2.1  can  help.  This  statistic  might  be  used  as  a  rough  guide  for 
determining  sampling  rates  in  future  monitoring. 

Extracting  the  information  available  in  these  messy  data  sets  is  also  problematic.  We  note  here 
that  this  project  should  be  considered  a  first-stage  statistical  analysis,  which  is  necessarily  a  bit 
informal  and  coarse. 

In  particular,  we  did  not  attempt  to  do  any  kind  of  formal  sequential  analysis.  This  investigation 
has  naturally  been  iterative  and  conditional:  if  a  data  set  has  sufficient  density,  then  test  it  for 
trend;  if  it  shows  no  evidence  of  trend,  then  test  it  for  seasonality;  etc.  At  each  stage,  we  start  with  a 
probabilistically  clean  slate — we  a^ssume  the  truth  of  whatever  conclusion  we  drew  in  the  previous 
stage,  ignoring  the  uncertainty  of  that  conclusion.  A  sequential  analysis  would,  at  each  stage, 
reflect  the  uncertainty  of  all  previous  stages  by,  for  example,  changing  significance  probabilities 
associated  with  tests;  these  probabilities  would  be  conditioned  on  the  results  of  previous  tests. 
Such  an  analysis  would  be  complex. 

One  other  topic  which  might  be  addressed  by  a  more  refined  analysis  is  that  of  measurement 
error.  If  laboratory  techniques  are  not  so  precise  that  they  can  measure  the  true  value  of  a  water 
quality  parameter,  but  instead  measure  with  some  error,  then  the  statistical  analysis  should  reflect 
this  measurement  error.  Such  error  might  tend  to  inflate  standard  errors.  No  attempt  was  made 
to  incorporate  measurement  error  in  our  analyses. 

We  should  also  note  that  this  project  suggests  several  interesting  topics  for  statistical  research  and 
application.  As  we  have  hinted  earlier,  we  believe  that  a  natural  approach  to  modeling  these  water 
quality  data  would  be  multi-dimensional  and  time-dependent;  perhaps  further  research  could  make 
these  irregularly  observed,  small-sample  time  series  amenable  to  analysis.  One  topic  arising  in  this 
project  is  that  of  determining  when  a  change  (presumably  due  to  human  impact)  has  occurred  in 
the  distribution  of  some  water  quality  parameter;  this  seems  like  a  natural  area  for  the  application 
of  diange-point  detection.,  a  currently  active  topic  of  statistical  research  which  seeks  to  answer  the 
question,  "How  much  data  is  needed  for  us  to  decide  with  some  degree  of  confidence  that  a  cliango 
has  occurred  in  the  distribution  of  the  data?" 

Overall,  this  project  has  suggested  that  despite  serious  problems  with  the  historical  data  record, 
the  establishment  of  water  quality  standards  based  on  ambient  conditions  is  possible,  and  the  poten- 
tial for  productive  interaction  between  statisticians  and  water  quality  experts  in  the  establishnicnl 
of  these  standards  is  great. 


33 


6.  References 

Brockwell,  P.J.  and  R.A.  Davis,  "Time  Series:  Theory  and  Metliods,"  Springer- Vt-rlag,  New  York,  l'JS7. 

David,  li.A.,  "Order  Statistics,"  Wiley,  New  York,  1970. 

Gibl)ons,  J.D.,  "Nonparamctric  Statistical  Inference,"  McGraw-Hill,  New  York,  197]. 

Lehmann,  E.L.,  "Nonparametrics;  Statistical  Methods  Based  on  Ranks,"  Holden-Day,  San  Francisco,  1973. 

Mood,  A.M.,  F.A.  Graybill,  and  D.C.  Boes,  "Introduction  to  the  Theory  of  Statistics,"  McGraw-Hill,  Now  York, 

1974. 
SAS  Institute  Inc.,  "SAS/GRAPH  Guide  for  Personal  Computers,  Version  6  Edition,"  SAS  Institute  Inc.,  Gary, 

NG,  1987. 
SAS  Institute  Inc.,  "SAS  Language  Guide  for  Personal  Computers,  Version  6  Edition,"  SAS  Institute  Inc.,  Carv, 

NC,  1985. 
SAS  Institute  Inc.,  "SAS  Procedures  Guide  for  Personal  Computers,  Version  6  Edition,"  SAS  Institute  Inc.,  Car>', 

NC,  1985. 
SAS  Institute  Inc.,  "SAS/STAT  Guide  for  Personal  Computers,  Version  6  Edition,"  SAS  Institute  Inc.,  Car}',  NC, 

1987. 
United  States  Geological  Survey,  Methods  for  determination  of  inorganic  substances  in  water  and  Jlitvial  sedi- 
ments, Chapter  Al,  in  "Book  5,  Techniques  of  Water  Resources  Investigation  of  the  United  States  Geological 

Survey,"  U.S.  Government  Printing  OfTice,  Washington  D.C 


34 


